Variational theory for site resolved protein folding free energy surfaces 
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We present a microscopic variational theory for the free energy surface of a fast folding protein 
that allows folding kinetics to be resolved to the residue level using Debye- Waller factors as local 
order parameters. We apply the method to A-repressor and compare with site directed mutagenesis 
experiments. The formation of native structure and the free energy profile along the folding route 
are shown to be well described by the capillarity approximation but with some fine structure due 
to local folding topology. 

PACS number: 87.15 -v 

Proteins fold on a configurational energy landscape that has the shape of a funnel Q . As the protein moves down 
the funnel towards the native state, incomplete cancellation of the entropy and energy losses may result in free energy 
barriers. So far, proteins that fold fast exhibit single exponential kinetics Q], consistent with a free energy profile 
that has a single highest barrier along the progress coordinate. Central issues are the origin of the free energy barrier 
for fast folding proteins and how the ensemble of structures which represent the bottleneck is to be characterized. 
We address these questions using a variational approximation that describes ensembles of partially folded proteins 
at the highest level of resolution, i.e., the specific role of individual residues in guiding the protein to the native 
state is quantified. In the laboratory, Fersht has developed a probe of the transition state or bottleneck ensemble 
through protein engineering kinetic studies in which the sequence of the protein is altered by replacing residues one 
at a time ||. The experiment yields the fraction of the time that the mutated site is in the native conformation in 
the bottleneck ensemble by comparing folding rates of the mutant to the wild type. Since this can be done for any 
residue in the sequence, these studies are inherently "site resolved" . Resolving the transition state ensemble to this 
level is one way to monitor the average of the many routes taken as proteins fold. 

Previous analytic mean field theories and simulations have produced energy landscapes in one or two global di- 
mensions characterizing the folding ensemble Q|. We develop here a free energy profile for proteins with a funneled 
landscape that is completely "site resolved", i.e., one dimension per residue, by extending the mean field variational 
calculations presented in j5j. The underlying Hamiltonian explicitly incorporates chain stiffness and connectivity 
while the approximation employs a variational density that monitors local order parameters for folding akin to the 
Debye- Waller factors (also called temperature factors) for individual residues seen in X-ray crystallography. 

The basic Hamiltonian for an interacting polymer chain is H = -ff c hain + -ffint where -ff c hain is backbone potential 
and _ff; n t are the interactions between distant monomers along the chain. i? c hain is an effective harmonic potential 
/3-ffchain = l/2Xi r i • ^ ij " r j + B r i where {r^} are the positions of the N a-carbons, and (3 = 1/fcgT is the inverse 
temperature. The first term enforces the chain connectivity while the second term confines the radius of gyration to 
a reasonable value (achieved by fixing B to a small constant: B = 3/2a 2 x 10 -3 ). For the connectivity matrix, Tij, 
we use the well known Gaussian approximation to the freely rotating chain derived in Denoting the i th bond 
vector by &i — (rj + i — rj) and the angle between successive bond vector by 8, this stiff chain model is defined by the 
correlations (a^ • a^ + ;) = a 2 g l , where a is the mean bond length and g = cos#. Following Bixon and Zwanzig, is 
determined by inverting these correlations and transforming to the bead representation resulting in a pentadiagonal 
matrix that depends on the stiffness parameter g; (for the explicit matrix, see |||). In the limit g — > 0, describes 
the standard flexible chain, whereas g — > 1 corresponds to a rigid rod. The persistence length, /, for this chain is given 
by I = a/(l — g). We use g = 0.8 giving (with a = 3.8A) I w 20A, the persistence length for poly-L-alanine JtJ. 

Wc take interaction between distant monomers along the chain to be restricted to specifically native- like interactions 
-ffint = ^iju(\vi — rj|) where the isotropic pair potential, has a minimum at a non-zero distance produced by 

summing three Gaussians, u(r) = r y s e^^ BT +7ie~^ r — r yie~P' r ; the short- and intermediate-range terms are repulsive 
while the long-range Gaussian is attractive (f3 s > /?, > Pi). The sum over pair interactions ) is restricted to 

native contacts. This constraint gives a smooth funnel shaped energy landscape, appropriate for fast folding proteins. 
This is an extreme realization of the principle of minimum frustration [|J and is reminiscent of the lattice model 
originally introduced by Go |9) . The heterogeneity of the interaction between different residues is reflected by the 
strength ey. Non-native interactions can also be included in H lni and treated by our variational method. 
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To study the many dimensional free energy surface denned by H, we choose local order parameters that can 
characterize the ensemble of partially folded structures by specifying the temperature factor for each residue, Bi. This 
describes the mean square fluctuations of a residue about its native position and for fully folded proteins has been 
measured. A similar local order parameter for folding has been used in lattice simulations [ flO| and earlier analytical 
work ||. Consider the free energy surface defined by the set of scalar fluctuations of each residue from its native 
position {rf}, Bi — (r< — ) 2 . The free energy surface ^[{B,}] for an ensemble specified by {Bi} is given by 



= T r 



•:>•, r,V), 



(1) 



= JVXTre- 0H[x] , (2) 

where f3H[X] = (3H + £ X l (B l - (n - rf ) 2 ), and VX = \[. d\j/2m. 

Denoting the integrand in Eq.(||) by e~^ F ^- x \ we approximate F[X] with the help of a reference Hamiltonian Hq 
and the Gibbs-Bogoliubov variational expression F[X] « —ksTlogZa + (H[X] — i?o)o ; where Zq — Tr [e — ^"l, and 
(• • -)o means the average with respect to Hq. The reference Hamiltonian describes a Gaussian chain constrained 
to fluctuate about the native structure {rf } by a harmonic external field: (3Hq = -ff c hain + J2^i( r i ~ r i V ) 2 - The 
variational parameters {Ci} are conjugate to {Bi}. This reference Hamiltonian captures the two stable phases of fast 
folding proteins: the globule with small {Ci} and the native state with uniformly large {Ci}. {Ci} also form a set 
of local order parameters for folding. A similar (but more elaborate) reference Hamiltonian was used to determine 
the phase diagram for proteins with a rugged energy landscape as well as to study folding free energy barriers [g]. 
These mean field studies employed a global order parameter for nativeness by setting all Ci equal in one region of 
the protein. Different related effective harmonic variational Hamilitonians have been employed to study polymers in 
random media [jll | , random directed polymers [jl2| , and random copolymers ]T^ ] . 

Evaluating Eq.(2() using the steepest descents approximation gives the stationary condition Bi = ((r^ — rf r ) 2 )o as a 
function of {Ci} leading to a one to one relation between the {Bi} and {Ci}. Technically, it is more convenient to study 
the free energy surface in {d} space so that we consider the variational free energy surface expressed as F[{Ci}] = 

E — ST with the estimate for the energy E — J2 e ij( u ( r ij))o an d tne entropy S/ks — logi?o + J2(Ci( r i — r i V ) 2 }o 
as a function of {Ci}. 

Since Hq is quadratic, Z and all of the averages are expressible in terms of the correlations, (r^ ■ rj) — (rj)o • (i"j)o = 
3/2Gy , with Gij = [1/2 ITy ■ + (B + d)8ij] . Zq involves the determinant of the correlation matrix while the 
averages can be calculated using the density of site i, Pi(r) — (S(r — i*j)}o, and the pair density between sites 
i and j, Pij{r) — (S(r — i*y))o. In terms of the average position Sj = Ylj GijCj T f these densities are Pi(r) = 

(7rGii)~ 3/2 exp [-(r - s i ) 2 /G l i] , and p i:j (r) = (Tr6G i:j y 3/2 exp [-(r - s^/AGy] where Sdj = G ri + G 33 - 2G{j. 

Since both F[{Ci}] and V cF[{Ci}] can be expressed analytically in terms of Gij (which is calculated numerically), 
it is relatively easy to locate the minima and saddle points numerically | fl4f . Once the transition states and the folded, 
unfolded, and local minima are determined, we define the average folding route to be the connected steepest descents 
path from each transition state to the neighboring minima. Only the global minimum of ^[{Ci}] is rigorously an 
upper bound but the saddle points and local minima should also be good estimates for the true free energy surface. 

We now apply the model to the folding of the A-repressor protein. Ag-85 is a good candidate system since it is small 
(80 residues) and folds extremely rapidly in 20 /i-sec following two-state kinetics |l5). Recently Oas et al. probed 
the structure of the transition state ensemble of A@-85 by comparing the folding rates measured with NMR for seven 
mutants made by alanine to glycine replacements ]T^ |. The folding rates can be connected to the structure of the 
transition state by the (f> parameter developed by Fersht || , cf> = A log kf/A log K (fc/ is the folding rate and K denotes 
the equilibrium constant). (f> ~ indicates that the conformation of the mutated residue in the transition state is 
similar to the globule, whereas <f> ~ 1 suggests that this residue has native structure in the transition state ensemble. 
Based in part on these ^-values Table [j], Oas proposed that helices HI and H4 are structured in the transition state 
ensemble. While a more extensive mutation study is necessary to characterize fully the transition state ensemble, a 
comparison to these results is a strong test for the theory presented here. 

We define native contacts between residues with /3-carbons within a distance of 6.5 A (a-carbons for glycines) in 
the native structure |T^ ] that are separated by at least four monomers in sequence. The pair distribution function 
of the distance between a-carbons is used to constrain the parameters of the effective pair potential. We find the 
intermediate- and long-ranged interaction parameters (7^, f3ia 2 ,ji, (3ia 2 ) — (9.0, 0.8, 6.0, 0.4) give an effective potential 
well that contains all the native contact distances and has a minimum at the most probable C Q — C Q contact distance, 
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r* = 1.6a, with u(r*) = — 1. The short-range interaction represents the hard-core repulsion between residues and 
gives excluded volume; with the choice of values (7^, (3 s a 2 ) = (25.0, 4.5), the repulsion roughly balances the attractive 
energy in the globule state allowing us to study a folding transition that occurs directly from a random coil (i.e., near 
the theta temperature). We will compare the free energy surface using a homogeneous contact strength (e^ = eo) 
with that of the full 20 letter Miyazawa-Jernigan contact energies pll in which contact between different residues 
have different energies. 

We now consider a low energy folding route on the free energy surface connecting the globule and native minima. 
Fig. [I] shows the free energy along this path at the folding transition temperature, Tf, plotted as a function of the 
fraction of energy stabilization relative to the native state, Enorm- The stationary points of the free energy surface 
form a broad barrier with a reasonable height (5 — IksTf) for fast folding proteins. The barrier for the homogeneous 
case of all equal interactions is approximately 30% larger than for the heterogeneous contact energy model. Another 
difference between the two models is that we find many more transition states and local minima (not shown) for the 
heterogeneous case. These arise from the competition between contacts of different strengths. 

The Debye- Waller factor (temperature factor) of each residue contains structural information of the stationary 
points along the folding route. The temperature factors plotted versus sequence number at four of the saddle points 
along the folding route for the heterogeneous model are shown in Fig. |[ Even the globule already has some structure 
though its fluctuations are large. Comparing these curves progressively from the globule to the native state, we see 
that the barrier at TS\ is described by the formation of helices H4-5 while the central region of helix HI which docks 
with H4 is partially localized but with substantial fluctuations. This suggests that the stabilizing contacts between 
H4-5 and HI are due to the general increase in density rather than any very strong contacts between specific residues. 
Following this is the completion of helix H5 and the center of helix HI, while helices H2-3 remain relatively disordered 
at TS3. Lastly, helices H2-3 become increasingly more ordered along this folding route as indicated by the temperature 
factors at TS4. 

The folding route described above agrees with the conclusions of the Oas group namely, helices HI and H4 
are structured in the transition state ensemble, whereas helices H2 and H3 are unstructured. The 0-values obtained 
from this calculation makes the comparison more precise. As in the experimental analysis, we assume the ensemble 
of structures do not change but recalculate the free energy for each mutant at the saddle-points. Using k ~ e~@ AF , 
we calculate <p at each saddle-point and their average over the four transition states. The results are given in Table 
H The agreement with experiment is quite reasonable in light of the rough approximations made in modeling the 
experiment. The worst agreement is for the mutation M20. This is a surface residue with no tertiary contacts by our 
definition, thus other terms in the energy may be contributing. Some obvious improvements to this model such as 
explicit hydrogen bonding and many body forces can easily be made, but our aim here is to explore simplest model 
that give a physically reasonable and direct picture of the folding route for fast folding proteins. From this point of 
view, the agreement with experiment is very encouraging. 

Examination of the average folding route also leads to a simple physical picture for the barriers under the thermo- 
dynamic conditions of folding at Tf directly from the random coil. The progression of the folding of the 3D structure 
is shown in Fig.^j, where the sites of the native structure are colored according to the fraction of energy gained at 
that site. The first bottleneck involves partial structure formation in approximately 40% of the chain (in helices 
H4-5). Subsequently, a picture much like that of the growth of an ordered phase in an ordinary first order transition 
emerges with a front of progressive ordering crossing the protein. This is reminiscent of the capillarity theory [[l9[ . 
Within the capillarity picture, one imagines an ordered region that is completely folded separated by a sharp inter- 
face from a completely unfolded region. At Tf, the free energy of progressively forming folded structure is given by 
/cap = j(—Nf + N^ 3 ) where Nf is the fraction of native residues, and 7 is the surface energy cost. As shown in 
Fig. [I], this equation provides a good fit to the stationary points in both the homogeneous and heterogeneous models, 
identifying Nf with normalized energy E^ otm (defined in Fig. [I]) and treating 7 as a fitting parameter. 

Superimposed on the average behavior of the profile are fluctuations representing the fine structure arising from 
inhomogeneity of the local folding free energy. It is obvious that these fluctuations arise for the heterogeneous model 
because of varying interaction energies, but are still present for the pure homogeneous Go like model. This shows 
the high free energy intermediates Eo 20 along the average folding route for a very funnel-like surface are mostly 
determined by the folded topology. Within the capillarity picture, the smaller barrier for the heterogeneous case can be 
interpreted as being due to wetting, as expected for the random field Ising model pjj . The thermodynamic conditions 
studied here favor the capillarity picture with a sharp interface. When folding occurs from an already collapsed state 
the free energy difference of the bulk unfolded and folded phases is smaller leading to a broader interface. The basic 
formalism can be used for this other regime as well. 

We would like to thank Ben Shoemaker for helpful discussions. This work was supported by NIH Grant No. PHS 



3 



R01 GM4557. S.T. was supported by the Japan Society for Promotion of Science. 



[1] P. E. Leopold, M. Montal, and J. N. Onuchic, Proc. Natl. Acad. Sci. U. S. A. 89, 8721 (1992); J. D. Bryngelson, J. N. 

Onuchic, N. D. Socci, and P. G. Wolynes, Proteins: Structure, Function and Genetics 21, 167 (1995). 
[2] W. A. Eaton et ai, Curr. Opin. Struct. Biol. 7, 10 (1997). 
[3] A. R. Fersht, A. Matouschek, and L. Serrano, J. Mol. Biol. 224, 771 (1992). 

[4] S. S. Plotkin, J. Wang, and P. G. Wolynes, J. Chem. Phys. 106, 2932 (1997); J. N. Onuchic, N. D. Socci, Z. A. Luthey- 
Schulten, and P. G. Wolynes, Folding and Design 1, 441 (1996); Z. Guo, C. L. Brooks, and E. M. Boczko, Proc. Natl. 
Acad. Sci. U. S. A. 94, 10161 (1997). 

[5] M. Sasai and P. G. Wolynes, Phys. Rev. A 46, 7979 (1992); S. Takada and P. G. Wolynes, Phys. Rev. E 55, 4562 (1997); 
S. Takada and P. G. Wolynes, J. Chem. Phys. 107, 9585 (1997); 

[6] M. Bixon and R. Zwanzig, J. Chem. Phys. 68, 1896 (1978). 

[7] C. R. Cantor and P. R. Schimmel, Biophysical Chemistry (W. H. Freeman and Co., New York, 1980), Vol. 3, p. 1013. 
[8] J. D. Bryngelson and P. G. Wolynes, Proc. Natl. Acad. Sci. U. S. A. 84, 7524 (1987). 
[9] N. Go, Annu. Rev. Biophys. Bioeng. 12, 183 (1983). 
[10] V. Pande and D. S. Rokhsar, Nature Struct. Biol. , (submitted). 

[11] S. F. Edwards and M. Muthukumar, J. Chem. Phys. 89, 2435 (1988); J. D. Honeycutt and D. Thirumalai, J. Chem. Phys. 
90, 4542 (1989). 

[12] M. Mezard and G. Parisi, J. Phys. I(France) 1, 809 (1991); T. Garel and H. Orland, Phys. Rev. B 55, 226 (1997). 

[13] A. Moskalenko, Y. A. Kuznetsov, and K. A. Dawson, Physica A 249, 353 (1998). 

[14] D. J. Wales, J. Chem. Phys. 101, 3750 (1994). 

[15] R. E. Burton et ai, J. Mol. Biol. 263, 311 (1996). 

[16] R. E. Burton et ai, Nature Struct. Biol. 4, 305 (1997). 

[17] L. J. Beamer and C. O. Pabo, J. Mol. Biol. 227, 177 (1992). 

[18] S. Miyazawa and R. L. Jernigan, J. Mol. Biol. 256, 623 (1996). 

[19] J. D. Bryngleson and P. G. Wolynes, Biopolymers 30, 177 (1990); A. V. Finkelstein and A. Y. Badretdinov, Folding and 

Design 2, 115 (1997); P. G. Wolynes, Proc. Natl. Acad. Sci. U. S. A. 94, 6170 (1997). 
[20] M. Silow and M. Oliveberg, J. Mol. Biol. 269, 611 (1997). 
[21] Villain, J. Phys. I(France) 46, 1843 (1985). 



Mutant 


M15 


M20 


M37 


M49 


M63 


M66 


M81 


(Helix) 


(HI) 


(HI) 


(H2) 


(H3) 


(H4) 


(H4) 


(H5) 


0Exp 


0.5 


1.0 


0.2 


0.3 


0.8 


1.2 


0.6 


(0)calc 


0.3 


0.3 


0.1 


0.2 


1.0 


1.0 


0.7 



TABLE I. <f> — values for A-repressor. 
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FIG. 1. (color) The free energy at the stationary points along the folding route as a function of the normalized en- 
ergy for the homogeneous (orange, O) and inhomogeneous (black, □) models. For an ensemble with average energy E, 
Enorm = (E — Eg) /(En — Eg), where En and Eg are the energies of the native state and globule state, respectively, 
/cap (described in the text) is also shown as the solid line with 7 = 35(42) ksTf for the heterogeneous(homogeneous) model. 
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FIG. 2. (color) The temperature factors (i.e., mean square fluctuations relative to the average position of i th monomer, Si) 
plotted as a function of sequence number for the heterogeneous model at the stationary points shown in Fig.[j]. The bar at the 
top indicates the helical secondary structure (H1-H5). 
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FIG. 3. (color) The 3D native structure of A-repressor colored according to the normalized energy of each site, 



(a 



)/(ef r — ef) with = y"/ . Uj{u(rij))o, evaluated at each saddle-point (clockwise from upper left), ef and ef 



are the energies of the i site in the native state and globule state, respectively. 
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